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Abstract 

Nickel (Ni) compounds are widely used in industrial and commercial products including 
household and cooking utensils, jewelry, dental appliances and implants. Occupational exposure 
to nickel is associated with an increased risk for lung and nasal cancers, is the most common cause 
of contact dermatitis and has an extensive effect on the immune system. The purpose of this study 
was two-fold: (i) to evaluate immune response to the occupational exposure to nickel measured by 
the presence of anti-glycan antibodies (AGA) using a new biomarker-discovery platform based on 
printed glycan arrays (PGA), and (ii) to evaluate and compile a sequence of bioinformatics and 
statistical methods which are specifically relevant to PGA-derived information and to 
identification of putative "Ni toxicity signature". The PGAs are similar to DNA microarrays, but 
contain deposits of various carbohydrates (glycans) instead of spotted DNAs. 

The study uses data derived from a set of 89 plasma specimens and their corresponding 
demographic information. The study population includes three subgroups: subjects directly 
exposed to Nickel that work in a refinery, subjects environmentally exposed to Nickel that live in 
a city where the refinery is located and subjects that live in a remote location. The paper describes 
the following sequence of nine data processing and analysis steps: (1) Analysis of inter-array 
reproducibility based on benchmark sera; (2) Analysis of intra-array reproducibility; (3) Screening 
of data - rejecting glycans which result in low intra-class correlation coefficient (ICC), high 
coefficient of variation and low fluorescent intensity; (4) Analysis of inter-slide bias and choice of 
data normalization technique; (5) Determination of discriminatory subsamples based on multiple 
bootstrap tests; (6) Determination of the optimal signature size (cardinality of selected feature set) 
based on multiple cross-validation tests; (7) Identification of the top discriminatory glycans and 
their individual performance based on nonparametric univariate feature selection; (8) 
Determination of multivariate performance of combined glycans; (9) Establishing the statistical 
significance of multivariate performance of combined glycan signature. 
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The above analysis steps have delivered the following results: inter-array reproducibility p=0.920 
+ 0.030; intra-array reproducibility /3=0.929 + 0.025; 249 out of 380 glycans passed the screening 
at ICC>80%, glycans in selected signature have ICC > 88.7%; optimal signature size (after 
quantile normalization)=3; individual significance for the signature glycans p=0.00015 to 0.00164, 
individual AUC values 0.870 to 0.815; observed combined performance for three glycans 
At/C=0.966, p=0.005, C7=[0.757, 0947]; specifity=94A%, sensitivity^, .9%; predictive (cross- 
validated) AUC value 0.836. 
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Introduction 



Nickel is used in large number of every-day applications including coins, jewelry, household 
and cooking utensils, orthodontic and orthopedic implants, and batteries [1]. Exposure to Ni 
as a result of daily dermal contact or through occupational or environmental sources is 
known to produce a variety of pathologies including allergic contact dermatitis [2] and 
cardiovascular diseases [3]. Long-term occupational exposure is associated with lung and 
nasal cancers [4], and an increased risk for acute respiratory syndromes such as mild 
irritation and inflammation, bronchitis, pulmonary fibrosis, asthma, and pulmonary edema 
[5]. The general population is also exposed to Nickel via ingestion, since Nickel is a 
contaminant in drinking water and is present as a component or contaminant of foods [6,7]. 
Nickel can also be secreted in human breast milk leading to early dietary exposure of infants 
[8]. 

Due to the growing evidence about their toxic effects, in 1990, certain Ni compounds have 
been classified by the International Agency for Research on Cancer (IARC) as being 
carcinogenic to humans [9]. Excellent reviews of carcinogenic activities of Nickel are given 
[10-13]. 

Understanding the mechanisms of Nickel carcinogenicity and identification of biomarkers of 
susceptibility to Nickel toxicity is needed to allow development of tests for earlier 
identification and protection of individuals with high risk for the development of Nickel- 
related diseases. 



This paper presents early results of linking Nickel exposure and presence of Nickel in 
human body to immune response measured by expressions of anti-glycan antibodies (AGA) 
using a new platform which is based on printed glycan arrays (PGA). 

PGA is a new biomarker-discovery platform, which has been developed and utilized for 
biomarker discovery in the last several years [14]. PGA has useful advantages over nucleic 
acid-based testing and other platforms including: minimal invasiveness of blood sampling, 
stability of antibodies, low cost and short turnaround time. The printed glycan arrays (PGA) 
are similar to DNA microarrays, but contain depositions of various carbohydrate structures 
(glycans) instead of spotted DNAs. Most of these glycans can be found on the surfaces of 
human normal and cancer cells, as well as on the surfaces of human infectious agents such 
as bacteria and other pathogenic microorganisms. Pathologies including infection, 
inflammation and malignant transformation are associated with the appearance of abnormal 
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glycosylation of proteins and lipids present on surfaces of altered cells in tissues and in 
circulation. 

The malignancy-related abnormal glycans are called tumor-associated carbohydrate antigens 
(TACA) [15]. There is evidence that numerous TACAs are immunogenic [16], and that the 
human immune system can generate antibodies against them. We have also demonstrated 
that the dynamics of anti-glycan antibodies detected by PGA can indicate the status of 
immune response to malignancies [17-19]. A prototype of PGA with a library of 200 glycan 
structures was built at Scripps Research Institute, La Jolla, California, under the auspices of 
the Consortium for Functional Glycomics (CFG), [20] . Further development and 
standardization of a PGA with 211 glycans was conducted at Cellexicon, Inc., La Jolla. The 
latest PGA version with a total of 392 probes, containing 380 glycans of pharmacological 
purity grade and 12 control probes was developed in the Tumor Glycome Laboratory at 
NYU SoM. The 211 and 380 PGA versions were developed in collaboration with 
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, of the Russian Academy of 
Sciences, Moscow, Russia. 

This paper describes a sequence of data processing and analysis steps, starting with quality 
analysis of raw PGA data and ending with a putative glycan signature which provides a 
basis for identification of individuals with high concentration of Nickel in urine. To the best 
of our knowledge, this is the first study in which the systematic immune-profiling of AGAs 
using PGA in plasma of individuals with different levels of occupational exposure to Nickel 
identifies the putative immune signature of Ni toxicity. The presented results demonstrate a 
potential of PGA-based analyses of serum/plasma anti-glycan antibodies coupled with the 
described bioinformatics approach in search of the disease biomarkers. 

Methods and Materials 

Demographic and clinical data 

The study population included 3 groups of subjects: (1) Nickel refinery workers in Jinchang 
(30 cases), (2) Residents of Jinchang (30 cases) and (3) residents of adjacent city of Zhangye 
(29 cases). The latter two groups of subjects had only environmental exposures to nickel. 
The human subject protocol for this study was approved by the Institutional Review Boards 
of both the New York University School of Medicine and the Lanzhou University School of 
Public Health (IRB # 09-0726). Written informed consent was obtained from all 
participating subjects. 

The demographic information included: age, smoking status, urinary creatinine [|ig/g] and 
Urinary Ni [|ig/L]. A characteristic of the study subjects is presented in Table 1 (Additional 
information about the study population can also be found in [21].) 

Urinary Ni, used to index the individual's personal exposure to Ni, was analyzed for all 
study subjects by inductively coupled plasma mass spectroscopy (Elan DRCII; PerkinElmer, 
Norwalk, CT USA, [22]). Urinary cotinine, a major metabolite of nicotine and a valid bio- 
marker of environmental tobacco smoke, was measured in each subject to confirm smoking 
status and control its potential confounding effects. Urinary cotinine measurements were 
measured using a Cotinine Direct ELISA kit (Immunalysis, Pomona, CA, USA [23]). 

A simple preliminary analysis of demographic data shows that there is a significant 
difference in the level of Nickel in urine among directly exposed subjects in refinery and the 
subjects who are not working at refinery. For example Figure 1 shows the distributions of 
urinary Nickel concentrations among three subgroups of subjects; the Kolmogorov-Smirnov 
test of unequal distributions gives the following p-values: 0.0017 between refinery workers 
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and Jinchang residents, and 0.0073 between refinery workers and Zhanghye residents. 
However, the distributions for subjects at Jingchang and Zhanghye are significantly equal. 
In addition, Figure 2 illustrates the fact that there are a greater number of subjects within the 
refinery workers with high content of Nickel in urine [>9.98 |ig/L] as compared with the 
number of subjects in the two other study groups. It is important to notice that some 
individuals not exposed occupationally to Nickel may also have high [>9.98 [ig/L] content 
of Nickel in urine. 

Printed glycan arrays 

A printed glycan array (PGA) is used as a biomarker-discovery platform in a form of a 
"glycochip". This glycochip is generated using standard robotic nano-printing technology 
that allows printing of a large range of amine-functionalized glycans/probes on amine - 
reactive N-hydroxysuccinimide (NHS)-activated glass slides (Schott GmbH, Germany) with 
surface modified for rapid covalent coupling. Glycans are printed in concentration of 50 |iM 
generating spot sizes of ~ 70 microns. In addition to 380 glycans, 12 control probes 
including print buffer samples as reagents for background quality control and a spot- 
reference location are included in each print set of 392 probes. All these 392 probes are 
distributed within two sub-arrays, each containing replicate subarrays of 14x14=196 probes/ 
spots. Each glycochip divided into two subarrays, accommodates 24 sub-grids, with 12 
replicates for each printed glycan/probe. 

The measurement of binding of human anti -glycan antibodies (AGA) to arrayed glycans, 
also called "immunoprofiling", is achieved as described in [24]. Briefly, the glycochip is 
first incubated at 37°C with the subject's plasma diluted 1:15 in a Carrier Buffer, allowing 
the binding of plasma antibodies to arrayed glycans. Plasma IgG, IgM and IgA 
immunoglobulins bound to glycans are visualized simultaneously with the "combo" 
biotinylated secondary goat anti human IgG, IgM and IgA antibodies (Pierce Biotechnology, 
Inc., Rockford, IL), and streptavidin-Alexa 555 (Invitrogen/Molecular Probes, Carlsbad, CA). 
Fluorescence signal intensities that correspond to the binding of antibodies to glycans are 
scanned using Perkin Elmer ScanArrayG at 90% laser power, and quantified with ImaGene 
software (BioDiscovery, Inc., El Segundo, CA). The total relative fluorescence signal 
intensity values (appx. range: 1,000 - 12,000,000 Relative Fluorescence Units) are used for 
further data processing and analyses. 

The quantified images are automatically analyzed for missing, or for overflowed spots (rare 
events) which are excluded from the final summarization of replicates performed by median. 
This process has enabled robust and accurate measurements. 

Figure 3 shows an excerpt from a scanned image of a glycochip developed with plasma of 
one of the study subjects. The four subgrids represent two replicates (columns) and two 
complementary subarrays (rows) which contain all 380 glycans in our current library and 12 
control probes. 

Quality analysis and data pre-processing 

In order to begin statistical analysis of PGA data with the goal of discovering potential 
putative signatures associated with the immune-responses to Nickel exposure, it is vital to 
first address the accuracy and the reproducibility of measurements, specifically the intra- 
and inter-array reproducibility of PGA signals. The former relates to precision of 
measurement of AGA bindings within a single array, while the latter relates to between- 
array biases induced by the platform, or even by the quality of plasma. 

The first step in this quality analysis is to test the platform with the benchmark sera. The 
process of printing, development and scanning of glycochips at NYU Tumor Glycome 
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Laboratory is interleaved with immunoprofling of the test sera called here "pooled arrays". 
The test sera is prepared by pooling sera from several healthy subjects, which are then stored 
in larger quantities for usage in all similar experiments. The interleaving is performed for 
each new glycochip print batch, and in each of the immunoprofiling experiments/days. In 
this project 8 pooled arrays have been used. The scatter plots in Figure 4 show concordance 
between four typical pairs of pooled arrays out of total of 28 combinations (others were 
omitted to fit the page). The labels at the diagram axes indicate the instances of pooled sera 
arrays. The numbers above diagrams are Lin's concordance correlation coefficient [25] CCC 
(left) and the Pearson correlation coefficient PCC (right). The fact that PCC is consistently 
higher than CCC, suggests a linear inter-slide bias in location and scale. This bias can be 
mostly removed by various normalization techniques. The results for all 28 combinations of 
pooled arrays are summarized in Figure 5. 

The four diagrams in Figure 5 show Lin's Concordance Correlation Coefficient (CCC), 
Pearson Correlation Coefficient, Overall Lin's Concordance Correlation Coefficient [26,27] 
(OCCC), and Coefficient of Variation (CV). The first diagram (CCC) suggests relatively 
large bias between arrays, while the Pearson CC shows that the inter-array concordance is 
satisfyingly good. The OCCC and CV are computed independently for each array: the 
former is obtained by applying CCC to 66 combination pairs of 12 replicated subgrids on the 
same glycochip. These two diagrams show excellent intra-array concordance, i.e. small 
measurement error. All diagrams are obtained after screening out the glycans with small 
intensities, small Inter Class Correlation coefficient (ICC, discussed later) and the highly 
correlated glycans (/a>0.95). The number of glycans that have survived the screening is 249 
out of total 380 glycans in our current PGA library. 

To demonstrate the effect of inter-array bias among pooled sera, distributions of signal 
intensities are shown in Figure 6. The distributions are presented as sorted signal intensities 
for each array, across all glycans which have survived the screening. For the purpose of this 
diagram the highly correlated glycans were not screened out. In order to emphasize the 
differences, the diagram shows only 35 glycans associated with highest signal intensities of 
bound AGAs, however similar differences can be observed with all other glycans. The first 
diagram confirms the finding in Figures 4 and 5 about the inter-array bias. The second 
diagram shows the distributions after intra-array normalization (A-normalization). The 
dotted line in both diagrams shows mean distribution taken across all arrays, which is 
equivalent to distributions after quantile normalization [28,29] (Q-normalization). As 
shown, the A-normalization is not entirely effective as the Q-normalization. Other 
normalization techniques, such as between-array normalization, which is essentially 
equivalent to IQR-normalization, and sequential A-Q normalization were also considered, 
but they performed unfavorably. 

Figure 7 shows similar intensity distributions for the study data used in discriminatory 
analyses. 

The next step addresses the ICC of the measurements, i.e. the ratio between the biological 
variability across all subjects and the total variability which includes the controlled 
measurement error. The results are shown in Figure 8. The common x-axis of all diagrams in 
the figure represents glycans sorted according to decreasing values of ICC (black line). The 
ICC is estimated as in [30]. The upper diagram in blue color shows the robust version of 
ICC obtained by appropriate replacements of means and standard deviations by medians and 
median absolute deviations (MAD) respectively. The lower two diagrams (green and 
magenta) represent coefficient of variation and its robust version. The relatively high ICC 
values for most of the glycans, as well as the low CV values assure that the data in Nickel 
Exposure Study qualify for further discriminatory analyses. 
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Discriminatory samples and signature length 

After having established that the data obtained from AGA immunoprofiling qualify for 
further investigation it is necessary to define the subsamples which can be used in 
discriminatory analysis. Our original goal was to identify glycan-based immuno-signature 
that would allow identification of individuals exposed to potentially dangerous Nickel 
sources, such as occupational exposure to Nickel in Nickel refinery. We have therefore first 
compared three study subpopulations where one subpopulation included 30 workers of 
Nickel refinery, and two other subpopulations with subjects exposed to only environmental 
Nickel sources. Interestingly, this analysis did not deliver plausible "signature of hazardous 
Nickel exposure", most likely due to yet unknown factors, such as luck of adequate glycan 
probes. 

We have then started a search for differences in immunoprofiles of individuals based on 
their Urinary Nickel content regardless of their location and assignment to a study sub- 
group. A straightforward approach would be to define a simple cutoff value which separates 
individuals with low from high concentration of Nickel in urine. However, this approach did 
not yield satisfactory results, seemingly due to the adverse impact of cases with medium 
concentrations of urinary Nickel, which produced undesired clutter. The approach that has 
offered better results was to exclude the cases with medium concentrations. For example, the 
cutoff value can be the number of cases in balanced subsamples with low and high 
concentrations. Determination of this cutoff value is a matter of compromise: large value 
increases the clutter, while the small value causes loss of discriminatory power, both 
resulting in diminished statistical significance. 

For the purpose of finding the optimal solution we have run a series of bootstrap tests for 
various subsample sizes and for various signature lengths. The replicate statistic has been 
chosen to be area under the ROC curve (AUC), which has a number of desirable properties 
over discrimination accuracy, including independence from discriminant bias, good 
resolution and ranking ability [31]. The AUC is computed for a combination of normalized 
signals associated with a chosen glycan signature, i.e. set of features. The combination is 
performed through projection determined with multivariate logistic regression (MLR). Each 
bootstrap run contained 500 replications based on permutations instead of usual resampling 
with replacement, the former being more conservative. The results for quantile-normalized 
data are presented in Figure 9, which shows the achieved significance level (ASL) [32]. The 
figure suggests the optimal subsample sizes of 18 subjects. It should be noted that the same 
repeated bootstrap tests were run for other normalization approaches and for sample sizes 
smaller than 16 and larger than 20, all giving inferior results. This cutoff value implies low 
concentrations less than or equal to 4.44 |ig/L of Urinary Nickel and high concentrations 
greater than or equal to 9.98 |ig/L of Urinary Nickel. 

So determined subsamples can be now used to perform a simple univariate feature selection, 
for example the non-parametric Wilcoxon-Mann- Whitney rank-sum test, the results are 
presented in the next section. 

Once the samples of two discriminatory groups are determined, it remains to decide the 
signature length, i.e. the number of top selected glycans, which minimize the likelihood of 
over fitting. This can be done by multiple cross-validations performed for various sizes of 
signatures, then by choosing the signature size which yields the optimal cross-validated 
performance. This is shown in Figure 10, which presents the 100 times repeated 10-fold 
cross-validation. For the performance measure is again used the combined AUC value. The 
diagram shows that the optimal cross-validated AUC value can be achieved with three- 
glycan signatures, which yields the predictive AUC value 0.836, while the training 
(observed) AUC value is 0.966. In addition, the diagram in Figure 10 presents the Kuncheva 
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stability index (SI) [33], which reaches the maximum at two glycans. The reason SI drops 
after two glycans is that the third and fourth glycan (GID=\33 and 136, Figure 1 1) alternate 
in various cross-validation folds. 

The stability of feature selection can also be illustrated by the frequency of occurrences of 
each feature in total of 100x10=1000 cross-validation folds, presented in Figure 11. As seen 
the glycan GID=\9\ has been selected 100% of times, while the glycans 264 and 133 were 
selected 97% and 93% of times respectively. After the third glycan, the frequencies drop 
significantly. 

Results and Discussion 

In the previous section we have determined the subsamples associated with low and high 
level of Nickel in urine which can be now used in discriminatory analysis and in 
identification of putative glycan signature. In addition, we have determined the optimal 
signature size, which will least likely cause over fitting. 

Discriminatory analysis 

A first step in discriminatory analysis is to perform some univariate test for all glycans of 
interest. Since the PGA signals depart significantly from normal distribution (they even for 
the most glycans have multinomial distributions) we prefer to use some non-parametric test, 
such as the Wilcoxon-Mann- Whitney two-sample rank sum test. An additional benefit of 
this test is that the AUC values are directly linked with the p-values of the test. The same 
test was employed in the previous section, where the statistic used for sample selection and 
cross-validation was the AUC value. 

The test was applied to quantile-normalized PGA signals obtained by median summarized 
replicates. The result for 10 glycans with lowest p-value, or highest AUC value is shown in 
Table 2. 

The first column of the table represents the glycan identification numbers (GID). The 
corresponding glycan structures are shown in Table 3. The signs of the z-statistic indicate 
whether the PGA signals decrease (negative Z), or increase (positive Z) with the increase of 
urinary Nickel levels. The relatively high AUC values suggest high discriminatory power of 
the samples. Low values of the false discovery rate (FDR) imply a good confidence in the 
results, especially for the first three glycans, which is in compliance with the finding in 
cross-validation test. 

The sixth column of the table, AUC C , shows the cumulative AUC values obtained for 
combination of all glycans above each respective glycan. For example the cumulative value 
for combination of three top glycans, GID=\9\, 264, 133, is Ai/C c =0.966. The combination 
of glycans is performed by multivariate logistic regression. 

A convenient way to visualize the performance of the training set for the selected glycan 
signature can be achieved with the Immunoruler [34]. The immunoruler is a bar graph which 
presents the subjects with low (left bars in blue) and the subjects with high (right bars in 
magenta) urinary Nickel. The bars indicate the risk scores, which are quantified as 
probability of membership to the group of high urinary Nickel. The bars are sorted 
according to these probabilities for each group separately. The two shades of each color 
indicate the quartile regions. The risk scores are computed by combining the quintile- 
normalized signals for the signature determined above. The combination of the intensities 
associated with glycans from the signature is based on MLR. This kind of visualization 
makes apparent the number of false positives FP=l, false negatives FN=2, true positives 
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TP=16 and true negatives TN=\1, as well as specificity 5^=94.4% and sensitivity S„=88.9%. 
The training precision is 91.7% and the observed AUC value is 0.966 (Figure 12). 

The individual statistical significance of each glycan from the selected signature is evident 
from p-values and FDR shown in Table 2. 

Now we need to establish the statistical significance of the observed A UC=0. 966 obtained 
for the combined three-glycan signature. This can be achieved with a bootstrap test. In order 
to keep a conservative approach we have performed the unbiased permutation bootstrap with 
1000 replications, rather than the bootstrap based on resampling with replacement. The 
empirical probability density function under null hypothesis is presented in Figure 13. 

As shown, the two-sample confidence interval is C/=[0.757, 0.947], or one-sided upper 
bound C£/=0.935. Consequently the achieved significant level is ASL=Q.Q05, which gives a 
very strong evidence against the null hypothesis. 

Finally, the comparison of normalized, but not transformed intensities of low (diagrams at 
left) and high (diagrams at the right) urinary Nickel for selected signature glycans is 
presented in Figure 14. As shown, the intensities are decreasing in subjects with high level 
of Urinary Ni for glycan GID=191, while the effect is opposite for glycans GID=264 and 
133, which complies with the Z-values in Table 2. The bottom diagrams show the combined 
intensities. 

Association with other demographic factors 

The focus of this study was to find association of Urinary Ni with the immune response 
measured by anti-glycan antibodies (AGA). A natural question after the analysis presented 
above is whether there is a possibility of association of AGA with other demographic 
factors, such as age, smoking and creatinine, listed in Table 1 . 

Since the age of subjects is of interest we are showing the result of linear regression of the 
Urinary Ni and the AGA for the most expressed glycan in signature, GID=191, with the age 
(Figure 15). 

In order to decrease the influence of outliers we have used in the second diagram the log- 
transformed quantile normalized intensities. As shown the age does not correlate with the 
Urinary Nickel and with the AGA, at least for the data at hand (/3=0.076 and /3=-0.093, 
respectively). 

Another variable of interest would be the smoking status of subjects. Unfortunately, the 
number of non-smoking subjects is only 15 versus 74 smokers, which makes a proportion of 
smokers 5:1. The small nonsmoking group and high sample imbalance makes it difficult to 
draw any plausible conclusion. 

Yet another variable from the list is creatinine. The bottom diagram of Figure 15 indicates 
that the Urinary Ni and creatinine are relatively highly correlated considering the variation 
in measured data (/3=0.761), suggesting that these factors are potentially interchangeable in 
our analysis. Therefore we have used Urinary Ni instead of creatinine which has offered 
slightly better performance in terms of stability of feature selection and statistical 
significance of combined AUC value. This however needs to be further investigated in a 
future study with larger samples. 
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Nickel exposure, glycosylation and AGA 

The direct link between the nickel exposure, aberrant glycosylation and anti-glycan 
antibodies has not yet been established. Cellular glycosylation is a highly dynamic process 
carried out by a concerted action of hundreds of glycosyltransferases, glycosidases and other 
proteins, and it is most likely that any nickel-related molecular damage on the nucleic acid 
level that results in cellular malignant transformation will also result in aberrant 
glycosylation. Salnikow and Kasprzak [35] discuss the direct effects of the chronic exposure 
to nickel on altering glycosylation and assembly of the extracellular matrix components as 
well as assembly and function of surfactants and complement by depleting intracellular 
ascorbate. One of the mechanisms of nickel on the human innate immunity has recently been 
elucidated by the elegant demonstration that nickel directly triggers human Toll-like 
receptor 4 (TLR4) and Pattern Recognition Receptor (PRR) signaling which results in the 
expression of multiple proinflammatory genes [36]. It is therefore likely that the expected 
modifications of glycosylation on the cell surface and in the extracellular matrix participate 
in the immune response that then targets molecules with the aberrant glycosylation pattern. 

The putative Ni-toxicity AGA-based signature identified here brings novel, and potentially 
very important, findings which reveal a significance of glycans some of which are yet 
known to the field of the disease biomarkers. For example, phosphorylated LacNAc, 6-P- 
Gal[31-4GlcNAc(3 (GID 191), a main discriminatory glycan in the putative Nitoxicity 
signature, has not been reported as a component of the glycome. This glycan has been 
synthesized in the laboratory of Prof. Nicolai Bovin as an analog of 6-O-Su- 
Gaipi-4GlcNAc(6 / -0-Sulfate-LacNAc) to study the specificity and effect of charge versus 
moiety structure in antibody recognition of Garpl-4GlcNAcP and Gal(31-4GlcNAcf$- 
containing negatively charged carbohydrates. To our surprise, while 6-P-Garpl-4GlcNAcP 
has shown major decrease in antibody bindings in subjects with high versus low urinary Ni, 
its sulfated analog 6-0-Su-Gaipi-4GlcNAc did not show any significant differences in 
antibody binding intensities between these two groups. We have also not found any 
significant correlation of antibody signal binding between 6-P-Gaipi-4GlcNAcP and other 
glycans present on our PGA. Because of the well-known toxic metal-chelating properties of 
phosphate, it is tempting to hypothesize that the antibodies binding on the PGA to this 
phosphorylated glycan, are in fact involved in the clearance of Nichelating phosphorylated 
and glycosylated macromolecules. 

While N-acetyllactosamies, in particular poly-N-acetyllactosamies consisting of repeated 
units of Gaipi-4GlcNAc (GID 384), have been well-recognized as Tumor Associated 
Carbohydrate Antigens [37-39], association of Gaipi-3GlcNAc or Lewis c (a disaccharide in 
GID 379) with malignant transformation is much less known. We have recently found 
antibodies differentially binding several glycans containing Lewis c disaccharide in sera of 
patients presenting with Non-Small Cell Lung Carcinoma and Malignant Pleural 
Mesothelioma as compared with sera of control subjects. The Lewis c glycans appeared in 
the putative signatures of these tumors, and we are currently investigating the significance of 
this glycan in malignant transformation. 

Highly correlated, differential antibody binding to several lactose (Gaipi-4Glc) glycans 
containing glycine or two amino acids as a spacer is a novel finding, and the true antibody 
target remains unknown. These molecules probably mimic aberrant molecular patterns on 
malignant cell surface where glycosphingolipids (inner Lac) play a significant role [39,40]. 

Based on the presented results, we hypothesize that the elevated level of nickel in urine 
signals a certain type of cellular and possibly systemic damage which is reflected by the 
differential expression of specific anti-glycan antibodies. The biological targets that include 
the glycans identified here by the antibody recognition remain unknown. Altogether, the 
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findings reported here could lead to the further investigations of the mechanisms and 
biomarkers of an individual's susceptibility to the nickel toxicity. 

Conclusion 

The goal of this study was to investigate the evidence of immune -response in workers of a 
Nickel refinery to hazardous levels of Nickel in their work place and the immune-responses 
reflecting elevated levels of Urinary Ni, as measured by anti-glycan antibodies (AGA). For 
this purpose we have evaluated immunoprofiles of plasma specimens from 89 subjects, 
some directly exposed to high levels of airborne Nickel in a flash-smelting workshop of 
nickel refinery, and some exposed to only environmental sources of Nickel. The plasma 
specimens were processed using the new high-throughput platform based on printed glycan 
arrays (PGA) in the form of a glycochip developed in the Tumor Glycome Laboratory of 
NYU School of Medicine. The extensive quality analysis has shown that data obtained from 
PGAs qualify for subsequent discriminatory analysis: the mean Pearson inter-array 
concordance coefficient for test sera vas p=0.921 + 0.03, proving a good platform inter-array 
concordance, and the mean coefficient of variation across PGA replicates CV=19.8 + 2%, 
proving good intra-array reproducibility. Similar analysis performed on actual Nickel data 
(89 arrays) has resulted in CV=21.6 + 4.9%, and inter class correlation coefficient (ICC) 
larger than 85% for 260 glycans on the chip (> 88.7% for glycans found in the signature). 
The successful feature selection has been achieved after the entire collection of Nickel 
specimens were divided into two groups, one containing subjects with low (< 4.44 ng/L) and 
the other containing subjects with high (>9.98 \xg/L) level of urinary Nickel. The screened 
and the quantile-normalized PGA signals were then used in univariate feature selection 
based on non-parametric Wilcoxon-Mann- Whitney rank-sum test, which has suggested the 
signature GID=\9\, 264 and 133. The number of glycans in the signature is limited to three 
glycans to avoid over-fitting, as determined by an unbiased 100 times 10-fold cross- 
validation test applied to various sizes of signatures. The signals associated with the 
signature can be combined using projection based on multivariate logistic regression, thus 
forming a single discriminative marker. The observed AUC value for this marker is 0.966. 
The statistical significance of this result has been confirmed with the permutation bootstrap 
test with 1000 repetitions which has provided a strong evidence against the null hypothesis 
at achieved significance level A5£=0.005. 

The work presented in this paper entails additional investigation, such as: 

• Correlation (regression) of Urinary Ni concentration with the combined 
discriminatory marker and other confounding factors; for example, length of 
exposure and subjects' other demographic and clinical information such as age, 
smoking, inflammation, presence of other diseases, etc. 

• Investigation of the impact of direct exposure to airborne Nickel on the level of 
antibodies against glycans as opposed to the environmental exposure. 

• Linking the urinary Nickel levels with the potential clinical presentation of disease 
symptoms. A wide range of urinary Nickel levels in refinery workers with a long- 
term, direct occupational exposure to Nickel suggests significant differences in 
individual biological responses to this occupational carcinogen. 

All these investigations require stratification of the already small cohort, which would 
significantly reduce the operative sample sizes, thus lowering the statistical significance and 
the plausibility of inference. Therefore much larger study cohorts are required. In addition, 
more extensive clinical and demographic information is also needed, including the follow- 
up health status information of the study subjects. 
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Figure 1. 

Probability density functions of Urinary Ni in subjects from three locations. The 
distributions show that a significantly larger number of workers of the refinery have a 
significantly higher content of Nickel in urine as compared with the subjects at the two other 
locations, with no occupational exposure to Nickel. 
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Figure 2. 

Number of subjects with low and high content of Urinary Ni within three groups: workers of 
the refinery located in Jinchang, residents of Jinchang who are not working in refinery and 
residents of a remote city of Zhanghye. The subjects of the former group are directly 
exposed to Nickel, while the subjects of the latter two groups are only environmentally 
exposed to Nickel. 
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Figure 3. 

Excerpt from a scanned image of a glycochip developed with plasma of one of the study 
subjects. The microarray contains 392 probes replicated 12 times and arranged into 24 
subarrays. The 392 probes contain a library of 380 glycans printed in concentration of 50 
|iM, and 12 control probes. The glycans are distributed into two subarrays, each containing 
196 probes. Each sub array contains 14 by 14 spots. The figure shows two replicated sub 
grid pairs. 
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Figure 4. 

Concordance log-log scatter plots for Nickel Exposure Study. The plots are part of inter- 
slide quality control which uses slides developed on different dates and on different PGA 
print batches with benchmark sera obtained from a pool of four healthy subjects. In this 
study there were 8 such slides whose development was interleaved with that of study 
subjects, resulting in 28 combination pairs. The scatter plots show logarithms of raw 
intensity signals. Numbers above plots are Lin's concordance correlation coefficient (CCC, 
left) and Pearson correlation coefficient (right). The latter coefficient ignores linear bias in 
scale and location, while the former shows the true differences. The figure presents only 4 
combinations out of 28, with patterns typical for this study; correlation coefficients for all 28 
combinations are shown in Figure 5. 
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Figure 5. 

Platform reproducibility for the Nickel Exposure Study. The diagrams show Lin's 
concordance correlation coefficient (CCC), Pearson's correlation coefficient (PCC), overall 
Lin's concordance correlation coefficient (OCCC), and coefficient of variation of replicates 
averaged across all glycans in array (CV). The CCC and PCC are computed for 28 
combination pairs of 8 slides developed with benchmark sera. The OCCC and CV are 
computed independently for each array: the former is obtained by applying CCC to 66 
combination pairs of 12 replicated subarrays from the same PGA. The first diagram (CCC) 
suggests marginally large bias between slides (20% trimmed mean across all combinations 
is 0.8), while the PCC shows that the inter-array concordance is satisfyingly good if we 
ignore the inter-array bias, which can be achieved with proper normalization. The last two 
slides show excellent intra-array concordance, i.e. small measurement error. 
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Figure 6. 

Distribution of intensities for 8 arrays of benchmark sera before and after normalization. The 
distributions are presented as sorted medium summarized intensities across all glycans 
which have survived the screening of low intensity and low ICC glycans. In order to 
emphasize the differences, the diagram shows only 35 glycans which correspond to highest 
intensities, however similar differences can be observed with all other glycans. The upper 
diagram confirms the finding in figures 4 and 5 about the inter-array bias. The lower 
diagram shows the distributions after intra-array normalization. The quantile normalization 
as much stronger than inter-array normalization (not shown here) would make distributions 
of all subjects the same and equal to the mean value (represented by the black dotted line in 
both diagrams). 
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Figure 7. 

Distribution of intensities for subjects in the Nickel Exposure Study before and after 
normalization. The diagram is similar to the diagram in Figure 6, only it contains 89 arrays 
and the range of glycans is doubled. As seen, the intra-array normalization might not be 
entirely effective; therefore the quantile normalization is used in further discriminatory 
analysis. 
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Figure 8. 

Intra-class Correlation Coefficient (ICC, black line) estimated for the entire PGA library. 
There are two ICC curves: the traditional (black curve) and the robust ICC (blue curve), the 
latter is based on the medians and median absolute deviations as opposed to means and 
standard deviations. The traditional ICC values are sorted by descending values of ICC, 
while all other curves in figure are rearranged to reflect the same glycan ordering. As shown, 
198 glycans have an ICC value greater than 90%. The diagram also shows the rearranged 
coefficient of variations: the traditional CV (CD-std, green) and the robust version of CV 
(CV-mad, magenta). The relatively high ICC values for most of the glycans, as well as the 
low CV values assure that data in the Nickel Exposure Study qualify for further 
discriminatory analysis as far as the technical noise is concerned. 
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Figure 9. 

Unbiased bootstrap tests performed on various sizes of low and high Urinary Ni samples 
ranging from 16 to 20, and various signature sizes ranging from 1 to 6. The statistic used for 
bootstrapping is combined AUC. The combined AUC is based on projection of features by 
multivariate logistic regression. The resampling was based on 500 permutations, as opposed 
to resampling with replacement, the former being more conservative. The criterion for 
selection of the best normalization method and the best sample cutoff value was the 
Achieved Significance Level of the test (ASL). The experiments were performed for several 
normalization methods but the diagram shows only the quantile normalization, which has 
delivered the best performance. As shown, the best result was obtained for cutoff value of 
18, which yielded ASL < 0.01, thus giving very strong evidence against the null hypothesis. 
This cutoff value is used throughout this report. 
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Figure 10. 

Cross-validation of low and high urinary Nickel samples each having 18 observations. The 
diagram shows cross -validated and combined training AUC values for various signature set 
sizes ranging from 1 to 7. The combined AUC values were obtained by multivariate logistic 
regression. The algorithm is an unbiased, 100 times repeated, 10-fold cross-validation test. 
As shown, the optimal result is obtained for three glycans in the signature, resulting in cross- 
validated AUC = 0.836, while the training (observed) AUC value is 0.966. The diagram also 
shows the Kuncheva stability index (SI), which reaches the maximum at two glycans. The 
reason SI drops after two glycans is the fact that the third and fourth glycans (GID =133 and 
136, see figure 1 1) alternate in various cross-validation folds. 
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Figure 11. 

Feature count in 1000 feature selections (100 repetitions, 10 folds each repetition). The 
diagram shows that the glycan GID =191 has been selected 100% of times, while the 
glycans 264 and 133 were selected 97% and 93% times, respectively. This diagram is 
another manifestation of feature selection stability. 
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Figure 12. 

Immunoruler diagram showing the training risk scores for 18 subjects with low (< 4.44 (xg/ 
L) and 18 subjects with high (>9.98 jig/L) urinary Nickel. The risk scores are obtained by 
projecting the quantile normalized, median-summarized intensities which correspond to 
glycan signature GID = 191, 264, 133, using multivariate logistic regression. The projection 
bias is determined under the assumption of equal cost of false positive and false negative 
rates. In order to facilitate the interpretation of data, the scores are sorted in ascending order 
for each sample and colored accordingly: low Urinary Ni in blue (left bars), the high Urinary 
Ni in magenta (right bars). Bars with different color shades represent quartile regions. The 
bar intensities correspond to the probability of belonging to the high urinary Nickel group, 
given the training data. This kind of visualization explicitly shows the number of false 
positives FP = 1, false negatives FN = 2, true positives TP = 16, and true negatives TN = 17, 
all obtained using the cutoff value 0.5. Consequently, specificity is S p = 94.4% and 
sensitivity S n = 88.9%. The training precision is 91.7% and the observed AUC value is 
0.966. 
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Figure 13. 

The bootstrap test for low and high urinary Nickel samples (each of size 18). The test claims 
a strong statistical significance of the observed combined AUC value of 0.966. The 
bootstrap statistic used is combined AUC value obtained by multivariate logistic regression 
applied to three top selected glycans. Selection of glycans in each bootstrap iteration is 
performed by Wilcoxon ranking. The test is performed with 1000 resampling by 
permutation and the resulting test p-value is ASL = 0.005. The diagram shows the empirical 
distribution under the null hypothesis that the observed AUC value is no larger than any 
other replicated value. The null distribution has two-sided confidence interval CI= [0.757, 
0.947], or one-sided upper bound CU = 0.935. As shown, the observed value is beyond both 
confidence limits. The empirical data is fitted with a Generalized Extreme Value 
distribution, which has offered the same p-value as the count of replications above the 
observed value. 
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Figure 14. 

Sorted intensities for 18 subjects with low Urinary Ni (diagrams at left, blue bars) and for 18 
subjects with high Urinary Ni (diagrams at right, pink bars) for three selected discriminatory 
glycans. The intensities are obtained by quantile normalization of median summarized 
intensities. As shown, the intensities are decreased in subjects with high level of urinary 
Nickel for glycan GID = 191, while the effect is opposite for glycans GID = 264 and 133. 
The bar graphs at the bottom show intensities combined with logistic regression. The 
relative intensity scale factor for each diagram is 10 6 . 
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Figure 15. 

Linear regression of Urinary Ni with age (top diagram), quantile-normalized and log 
transformed AGA for GID =191 with the age (middle diagram), and Urinary Ni with 
creatinine (bottom diagram). The diagrams show R-squared values, regression coefficients 
with their p-values and the Pearson correlation coefficient. The low Pearson correlation 
coefficients in the first two diagrams indicate that there is no association of Urinary Ni or 
AGA with the age of subjects in this study. The relatively high correlation between Urinary 
Ni and creatinine suggests that these factors are potentially interchangeable in our analysis. 
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Table 1 

Demographic Characteristics of Participating Subjects *. 





Zhangye 
Residents 


Jinehang 
Residents 


Ni Refinery 
Workers 


P value 




N 


29 


30 


30 




Age (years) 


43.4 ± 4.9 


41.6 + 6.3 


43.3 + 4.9 


>0.05 


Smokers [n (%)] 


25 (86.2) 


25 (83.3) 


25 (83.3) 


>0.05 


Urinary Ni 
Hg/L 


6.83 ± 3.53 


6.55 + 3.51 


8.43 + 3.22 


0.0246 


/-tg/g creatinine 


4.15 ± 1.52 


4.13 + 2.05 


5.79 + 2.08 


0.0002 



all participating subjects were male 
comparison between Ni refinery workers and control subjects (residents in Zhangye and Jinehang) 
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Table 2 

Wilcoxon-Mann-Whitney two-sample rank sum test applied to screened, quantile-normalized median 
summarized data from the Nickel Exposure Study. The samples contain 18 subjects with high (> 9.98 |ig/L) 
and 18 subjects with low (<4.44 |ig/L) level of urinary Nickel. The meaning of columns are as follows: GID - 
glycan identification number, Z- z-statistic, p - p-value of the test, FDR - false discovery rate, AUC - area 
under the ROC curve, AUCq - cumulative AUC value obtained by combining the above glycans through 
multivariate logistic regression, ICC - corresponding Intra-class Correlation Coefficient computed for raw 
data. The sign of the z-statistic indicates downregulation (negative sign) or upregulation (positive sign) of 
normalized signals. The low values of FDR, at least for top three glycans, imply a good confidence in the 
results. The glycans in the table are sorted by ascending order of the p-value. The sixth column suggests that 
combining several glycans can considerably increase the AUC value. For example, combining three top 
glycans: GID = 191, 264, 133, gives AUCq = 0.966. Figure 13 shows a solid statistical significance for this 
AUC value. 



GID 


Z 


P 


FDR 


AUC 


AUCc 


ICC 


191 


3.797 


0.00015 


0.0181 


0.8704 


0.8704 


93.3 


264 


-3.497 


0.00047 


0.0353 


0.8457 


0.9167 


89.1 


133 


-3.149 


0.00164 


0.0584 


0.8148 


0.9660 


88.7 


136 


-3.058 


0.00223 


0.0602 


0.8117 


0.9722 


82.2 


135 


-2.959 


0.00309 


0.0756 


0.7932 


0.9784 


93.3 


384 


-2.927 


0.00342 


0.1490 


0.7901 


0.9815 


94.5 


134 


-2.897 


0.00377 


0.0668 


0.7901 


1.0000 


81.7 


93 


-2.864 


0.00418 


0.0629 


0.7840 


1.0000 


91.5 


379 


-2.864 


0.00419 


0.1186 


0.7840 


1.0000 


94.5 


137 


-2.792 


0.00524 


0.0807 


0.7932 


1.0000 


88.9 
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Table 3 

Structures of glycans from Table 2. 



GID 


Glycan Structure 


Generic Name 


191 


6-O-P-Gaip 1 -4GlcNAcP 


6'P-LacNAc 


264 


Gaipi-4Gaipi-4GIcNAcP 


Gaip4LacNAc 


133 


Gaipi-4GlcP-NH<-Gly^Ala 


Lactose-Gly-Ala 


136 


Gaipi-4GlcP-NH^Gly<-Ile 


Lactose-Gly-Ile 


135 


Gaipi-4GlcP-NH^Gly<-Asn 


Lactose-Gly-Asn 


384 


Gaipi -4GlcNA C pi -3Gaip 1 -4GlcNA C p 


LacNAcP3LacNAc 


134 


Gaipi-4GlcP-NH<-Gly^Arg 


Lactose-Gly-Arg 


93 


Gaipi-4GlcP-NH<-Gly 


Lactose-Gly 


379 


Gaipi -3GlcNAcpi -3Gaip 1 -4GlcNA C p 


LewisC[i3LacNAc 


137 


Gaipi-4GlcP-NH^Gly<-Nle 


Lactose-Gly-Nle 



6'P-LacNAc: 6-phospate N- acetyl lac to samine 
GalfMLacNAc: Galactose^ 1 -4N-acetyllactosamine 
sp4: glycine 
aa: aminoacid 

LacNAcfi3LacNAc: N-acetyllactosamine(31-3N-acetyllactosamine 
Lew/5^©Z^cA^c:Lewis^(31-3N-acetyllactosamine 
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